    Automatic speech recognition system development in the “wild”

    The standard framework for developing an automatic speech recognition (ASR) system is to generate training and development data for building the system, and evaluation data for the final performance analysis. All the data is assumed to come from the domain of interest. Though this framework is matched to some tasks, it is more challenging for systems that are required to operate over broad domains, or where the ability to collect the required data is limited. This paper discusses ASR work performed under the IARPA MATERIAL program, which is aimed at cross-language information retrieval, and examines this challenging scenario. In terms of available data, only limited narrow-band conversational telephone speech data was provided. However, the system is required to operate over a range of domains, including broadcast data. As no data is available for the broadcast domain, this paper proposes an approach for system development based on scraping "related" data from the web, and using ASR system confidence scores as the primary metric for developing the acoustic and language model components. As an initial evaluation of the approach, the Swahili development language is used, with the final system performance assessed on the IARPA MATERIAL Analysis Pack 1 data. The Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via Air Force Research Laboratory (AFRL

    Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks

    Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available. This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word and sub-word level information available in the transcription process could be used to improve confidence scores. To encode all such information this paper extends lattice recurrent neural networks to handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation system show that the use of additional information yields significant gains in confidence estimation accuracy. The implementation for this model can be found online. Comment: 5 pages, 8 figures, ICASSP submission

    Low-resource speech recognition and keyword-spotting

    © Springer International Publishing AG 2017. The IARPA Babel program ran from March 2012 to November 2016. The aim of the program was to develop agile and robust speech technology that can be rapidly applied to any human language in order to provide effective search capability on large quantities of real-world data. This paper describes some of the developments in speech recognition and keyword spotting during the lifetime of the project. Two technical areas are briefly discussed, with a focus on techniques developed at Cambridge University: the application of deep learning for low-resource speech recognition, and efficient approaches for keyword spotting. Finally, a brief analysis of the Babel speech language characteristics and language performance is presented.

    Hyperspectral imaging to measure apricot attributes during storage

    The fruit industry needs rapid and non-destructive techniques to evaluate the quality of the products in the field and during the post-harvest phase. The soluble solids content (SSC), in terms of °Brix, and the flesh firmness (FF) are typical parameters used to measure fruit quality and maturity state. Hyperspectral imaging (HSI) is a powerful technique that combines image analysis and infrared spectroscopy. This study aimed to evaluate the potential of the application of the Vis/NIR push-broom hyperspectral imaging (400 to 1000 nm) to predict the firmness and the °Brix in apricots (180 samples) during storage (11 days). Partial least squares (PLS) and artificial neural networks (ANN) were used to develop predictive models. For the PLS, R² values (test set) up to 0.85 (RMSEP = 1.64 N) and 0.72 (RMSEP = 0.51 °Brix) were obtained for the FF and SSC, respectively. Concerning the ANN, the best results in the test set were achieved for the FF (R² = 0.85, RMSEP = 1.50 N). The study showed the potential of the HSI technique as a non-destructive tool for measuring apricot quality even along the whole supply chain.
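    The two test-set figures quoted in the abstract above, the coefficient of determination and the root mean square error of prediction (RMSEP), can be illustrated with a minimal sketch. It assumes the common definition of the coefficient of determination as one minus the ratio of residual to total sum of squares; the abstract does not say which definition the study used (some chemometrics work reports the squared correlation instead). The function names and the firmness values are hypothetical and are not taken from the study.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up firmness values in newtons, for illustration only.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(f"RMSEP = {rmsep(measured, predicted):.3f} N")
print(f"R^2   = {r_squared(measured, predicted):.3f}")
```

    RMSEP carries the unit of the predicted quantity (N for firmness, °Brix for SSC), which is why the abstract reports it with units while the coefficient of determination is dimensionless.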

    Incorporating uncertainty into deep learning for spoken language assessment

    There is a growing demand for automatic assessment of spoken English proficiency. These systems need to handle large variations in input data owing to the wide range of candidate skill levels and L1s, and errors from ASR. Some candidates will be a poor match to the training data set, undermining the validity of the predicted grade. For high stakes tests it is essential for such systems not only to grade well, but also to provide a measure of their uncertainty in their predictions, enabling rejection to human graders. Previous work examined Gaussian Process (GP) graders which, though successful, do not scale well with large data sets. Deep Neural Networks (DNN) may also be used to provide uncertainty using Monte-Carlo Dropout (MCD). This paper proposes a novel method to yield uncertainty and compares it to GPs and DNNs with MCD. The proposed approach explicitly teaches a DNN to have low uncertainty on training data and high uncertainty on generated artificial data. On experiments conducted on data from the Business Language Testing Service (BULATS), the proposed approach is found to outperform GPs and DNNs with MCD in uncertainty-based rejection whilst achieving comparable grading performance.

    Inflectional loci of scrolls

    Let $X\subset \mathbb P^N$ be a scroll over a smooth curve $C$ and let $\mathcal L=\mathcal O_{\mathbb P^N}(1)|_X$ denote the hyperplane bundle. The special geometry of $X$ implies that some sheaves related to the principal part bundles of $\mathcal L$ are locally free. The inflectional loci of $X$ can be expressed in terms of these sheaves, leading to explicit formulas for the cohomology classes of the loci. The formulas imply that the only uninflected scrolls are the balanced rational normal scrolls. Comment: 9 pages, improved version. Accepted in Mathematische Zeitschrift